Morphological Parsing

Morphological parsing, in natural language processing, is the process of determining the morphemes from which a given word is constructed. A parser must be able to distinguish between orthographic rules and morphological rules. For example, the word 'foxes' can be decomposed into 'fox' (the stem) and 'es' (a suffix indicating plurality). The generally accepted approach to morphological parsing is to use a finite-state transducer (FST), which takes words as input and outputs their stems and modifiers. The FST is initially created by algorithmically parsing a word source, such as a dictionary, that is annotated with modifier markup. Another approach is an indexed lookup method that uses a constructed radix tree. This route is seldom taken because it breaks down for morphologically complex languages.

With the advancement of neural networks in natural language processing, it has become less common to use FSTs for morphological analysis, especially for languages with a large amount of available training data. For such languages, it is possible to build character-level language models without explicit use of a morphological parser (Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov, "Enriching Word Vectors with Subword Information").
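As a rough illustration of the FST approach described above, the following sketch builds a tiny character-level transducer from a list of noun stems and uses it to parse 'foxes'. The class `SimpleFST`, the function `build_noun_fst`, and the tags `+N+SG` and `+N+PL` are hypothetical names chosen for this example; production systems use dedicated FST toolkits and far richer lexicons.

```python
from collections import defaultdict


class SimpleFST:
    """A tiny nondeterministic finite-state transducer over characters."""

    def __init__(self):
        # state -> list of (input_char, output_string, next_state)
        self.arcs = defaultdict(list)
        # accepting state -> tag string appended when the word ends there
        self.final_output = {}
        self.num_states = 1  # state 0 is the start state

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_arc(self, src, char, output, dst=None):
        if dst is None:
            dst = self.new_state()
        self.arcs[src].append((char, output, dst))
        return dst

    def transduce(self, word):
        """Return every analysis the transducer accepts for `word`."""
        results = []
        stack = [(0, 0, "")]  # (state, position in word, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word):
                if state in self.final_output:
                    results.append(out + self.final_output[state])
                continue
            for char, output, dst in self.arcs[state]:
                if char == word[pos]:
                    stack.append((dst, pos + 1, out + output))
        return sorted(results)


def build_noun_fst(stems):
    """Build a transducer from a stem list (an assumed, simplified lexicon)."""
    fst = SimpleFST()
    for stem in stems:
        state = 0
        for ch in stem:
            # Reuse an existing copying arc so stems share prefixes.
            for char, output, dst in fst.arcs[state]:
                if char == ch and output == ch:
                    state = dst
                    break
            else:
                state = fst.add_arc(state, ch, ch)
        fst.final_output[state] = "+N+SG"
        plural = fst.new_state()
        fst.final_output[plural] = "+N+PL"
        if stem.endswith(("s", "x", "z", "ch", "sh")):
            # Plural is spelled with an inserted 'e': fox -> foxes.
            mid = fst.add_arc(state, "e", "")
            fst.add_arc(mid, "s", "", plural)
        else:
            fst.add_arc(state, "s", "", plural)
    return fst


fst = build_noun_fst(["fox", "cat", "dog"])
print(fst.transduce("foxes"))
print(fst.transduce("cats"))
```

The stem characters are copied to the output tape, while the suffix characters are consumed silently and replaced by a tag on acceptance. A word the lexicon does not license, such as 'foxs', yields an empty list.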


Orthographic

Orthographic rules are general rules used when breaking a word into its stem and modifiers. For example, singular English words ending in a consonant followed by -y form their plural in -ies ('pony' becomes 'ponies'). Morphological rules, by contrast, cover the corner cases of these general rules. Both types of rules are used to construct systems that can do morphological parsing.
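General rules of this kind can be written as an ordered list of spelling patterns. The sketch below is only an illustration: the rule table `PLURAL_RULES` and the function `split_plural` are hypothetical, cover only a few English plural spellings, and will misanalyse irregular words and singular words that happen to end in -s.

```python
import re

# Each entry: (pattern on the written word, replacement that yields the stem).
# Rules are tried in order and the first match wins.
PLURAL_RULES = [
    (re.compile(r"([^aeiou])ies$"), r"\1y"),      # ponies -> pony
    (re.compile(r"(ch|sh|s|x|z)es$"), r"\1"),     # foxes -> fox
    (re.compile(r"([^s])s$"), r"\1"),             # cats -> cat
]


def split_plural(word):
    """Return (stem, number) using only the general spelling rules."""
    for pattern, replacement in PLURAL_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word), "PL"
    return word, "SG"


for w in ["ponies", "foxes", "cats", "cat"]:
    print(w, split_plural(w))
```

Ordering matters here: the more specific -ies and -es patterns must be tried before the bare -s pattern, otherwise 'foxes' would be split as 'foxe' plus 's'.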


Morphological

Morphological rules are exceptions to the orthographic rules used when breaking a word into its stem and modifiers. For example, while one normally pluralizes an English word by adding the suffix 's', the word 'fish' does not change when pluralized. Orthographic rules, by contrast, are the general rules. Both types of rules are used to construct systems that can do morphological parsing.

Various models of natural morphological processing have been proposed. Some experimental studies suggest that monolingual speakers process words as wholes when listening to them, while their late-bilingual peers break words down into their corresponding morphemes, because their lexical representations are less specific and because lexical processing in the second language may be less frequent than in the mother tongue.

Applications of morphological processing include machine translation, spell checking, and information retrieval.
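Returning to the distinction between exceptions and general rules made at the start of this section, one common way to combine the two is to consult a table of exceptions before falling back to a general rule. The following is a minimal sketch under that assumption; the table `EXCEPTIONS` and the functions `general_rule` and `analyze` are hypothetical and hold only a few illustrative entries.

```python
# Exceptions are looked up first; each maps a written form to its analyses.
EXCEPTIONS = {
    "fish": [("fish", "SG"), ("fish", "PL")],   # unchanged in the plural
    "mice": [("mouse", "PL")],
    "children": [("child", "PL")],
}


def general_rule(word):
    """A deliberately crude general rule: a final 's' marks the plural."""
    if word.endswith("s") and not word.endswith("ss"):
        return [(word[:-1], "PL")]
    return [(word, "SG")]


def analyze(word):
    """Return a list of (stem, number) analyses for `word`."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    return general_rule(word)


for w in ["fish", "mice", "dogs", "dog"]:
    print(w, analyze(w))
```

Because 'fish' is ambiguous between singular and plural, the table returns both analyses and leaves the choice to later processing.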

